library(readr)
library(ggplot2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble 1.4.2 ✔ dplyr 0.7.6
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ purrr 0.2.5 ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(urltools)
library(knitr)
library(ggmap)
##
## Attaching package: 'ggmap'
## The following object is masked from 'package:plotly':
##
## wind
library(choroplethr)
## Loading required package: acs
## Loading required package: XML
##
## Attaching package: 'acs'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:base':
##
## apply
library(choroplethrMaps)
library(RColorBrewer)
library(viridis)
## Loading required package: viridisLite
library(urltools)
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
library(tidycensus)
census_api_key("c08a79973dcaf2735a03640e063df308cf920465")
## To install your API key for use in future sessions, run this function with `install = TRUE`.
-Kyaw Za Zaw
-Myles Klapkowski
-Kat Lewis
-Danny Lawrence
-Look for a few more data sources that seem appropriate and add them to the Google Docs
-Try to get current data working in RMarkdown
-Try to create a few data visualizations and share through shared Google Docs for code
-The other person who didn’t make the viz should write a description for it
-Meeting times will be scheduled using www.doodle.com
-Combine the R codes into a single Rmd file, knit it and have it reviewed by all members before submission
Myles: Everyday(11-1) MF(after 4:30) W(4:30-6, after 8) T(3-4:30, 6-9, after 10) R(3-6:30, 8-9, after 10) Sat(1-4:30, 6-8) Sun(before 9pm, after 10pm) Kyaw Za: M (3:40 - 8:00pm), T (5:10 - 8:00pm), W (After 3:40 pm), R(After 5:10pm), F (After 3:40 pm), Sat (9:00am - 12:00pm; After 7pm) Sun(All day) Kat: MWF 12pm-6pm, Tuesday-Thursday 9:30-3; 4:30-6:30. Everyday 9:30pm-11:30pm. Saturday 1pm-5:30; Sunday 1pm-11:30pm Danny: MWF (After 12:00 pm), TR (After 11:30), Sat(All day), Sun(After 2:00pm)
-P4 (Thursday November 20)
-FCC Types of Broadband Connections https://www.fcc.gov/general/types-broadband-connections Used to explain what the different types of connections and how it is important in our area research topic
-State of NY wifi (csv): https://data.ny.gov/api/views/sjc6-ftj4/rows.csv?accessType=DOWNLOAD https://catalog.data.gov/dataset/broadband-availability-by-municipality Data is from 2010 Data includes different municipality types such as town, county, villate (we only want village) Data has population in these municipalities Data has no of households with each type of internet connection in these municipalities
-New York Metro Counties(Reference): http://www.baruch.cuny.edu/nycdata/population-geography/population.htm Used as a reference to see which counties are considered as “metro” Data is as recent as 2017 but still relevant Has population data on the metro counties (but this data not used.)
-US Census Bureau (used with tidyverse) https://www.census.gov/developers/ Includes access to the 5-year American Community Survey APIs. Includes all the data of the US Census Bureau(using the median income in 2010 in this project)
-New York State Population by Counties (csv): https://catalog.data.gov/dataset/annual-population-estimates-for-new-york-state-and-counties-beginning-1970 Population of each county in the New York state from 1970 to 2017 Some of the years are data from the census, some are postcensal estimates Includes FIPS.Code
-Are all households in the state of New York getting equal access to internet? The choloropleth map with rural counties The bar graphs of each household
-How many service providers are in the state of new york in each country and how is it related to population and income? The cholorpleth maps of no. of service providers The cholorpleth maps of population The choloropleth map of income
-Need additional data on these areas: -Price -Location based accessibility (apartment, single-family) -Access to internet with regards to race
Taken from Source: https://www.fcc.gov/general/types-broadband-connections
Digital Subscriber Line (DSL) is a wireline transmission technology that transmits data faster over traditional copper telephone lines already installed to homes and businesses. DSL-based broadband provides transmission speeds ranging from several hundred Kbps to millions of bits per second (Mbps). The availability and speed of your DSL service may depend on the distance from your home or business to the closest telephone company facility.
Cable modem service enables cable operators to provide broadband using the same coaxial cables that deliver pictures and sound to your TV set. They provide transmission speeds of 1.5 Mbps or more.
Fiber optic technology converts electrical signals carrying data to light and sends the light through transparent glass fibers about the diameter of a human hair. Fiber transmits data at speeds far exceeding current DSL or cable modem speeds, typically by tens or even hundreds of Mbps.
Wireless broadband connects a home or business to the Internet using a radio link between the customer’s location and the service provider’s facility. Wireless broadband can be mobile or fixed. Wireless technologies using longer-range directional equipment provide broadband service in remote or sparsely populated areas where DSL or cable modem service would be costly to provide. Speeds are generally comparable to DSL and cable modem. An external antenna is usually required.
#House Wireless Units by County in New York
NY_wifi <- read.csv("https://data.ny.gov/api/views/sjc6-ftj4/rows.csv?accessType=DOWNLOAD")
CountyNY_wifi <- NY_wifi %>% filter(Municipality.Type == "County")
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
data(county.fips)
fips_codes<-separate(county.fips,polyname,c("state","county"),sep=",")
ny_fips_codes<-fips_codes%>%
filter(state=="new york")
CountyNY_wifi<-CountyNY_wifi%>%
mutate(county=tolower(County))%>%
mutate(county=gsub("st. lawrence","st lawrence",county))%>%
left_join(ny_fips_codes,by=c("county"="county"))
CountyNY_wifi <- CountyNY_wifi %>% mutate(region=fips, value=X..Hse.Units.Wireline.1)
CountyNY_wifi6 <- CountyNY_wifi %>% mutate(region=fips, value=X..Wireline.Providers)
county_choropleth(CountyNY_wifi6, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(6,"RdBu"))) + ggtitle("Number of Wireline Providers in each County in the State of New York")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
This map shows the number of wireline providers in each County in the State of New York. There does not seem to be much of a pattern between urban or rural counties in this, nor average household income.
CountyNY_wifi2 <- CountyNY_wifi %>% mutate(region=fips, value=X..Cable.Providers)
county_choropleth(CountyNY_wifi2, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(4,"RdBu"))) + ggtitle("Number of Cable Providers by County")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
In this map, we can see that there is some pattern between the number of cable providers by county and the county’s proximity to urban areas. However, the counties with 4 cable providers are surprising as they do not relate to average household income, but the three counties marked in red do have relatively low populations. Because there are only a maximum of 4 different calbe providers, it is possible that this is a generally less popular/used internet service.
CountyNY_wifi3 <- CountyNY_wifi %>% mutate(region=fips, value=X..of.DSL.Providers)
county_choropleth(CountyNY_wifi3, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(5,"RdBu"))) + ggtitle("Number of DSL Broadband Providers by County")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
In this map the number of DSL providers by county is displayed. In New York City and the boroughs, it is interesting to see that there is only between 1 to 3 DSL broadband providers, whereas less densley populated areas of New York have many more.
CountyNY_wifi4 <- CountyNY_wifi %>% mutate(region=fips, value=X..Fiber.Providers)
county_choropleth(CountyNY_wifi4, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(6,"RdBu"))) + ggtitle("Number of Fiber Broadband Providers by County")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
This map describes the number of fiber broadband providers by county. It appears, from this map, that the counties in New York with lower household incomes have less providers of this kind of internet.
CountyNY_wifi5 <- CountyNY_wifi %>% mutate(region=fips, value=X..Wireless.Providers)
county_choropleth(CountyNY_wifi5, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(5,"RdBu"))) + ggtitle("Number of Wireless Broadband Providers by County")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
According to this mpa, most counties in New York State have between 6 and 7 wireless broadband providers.
Data Source: https://catalog.data.gov/dataset/annual-population-estimates-for-new-york-state-and-counties-beginning-1970
NY_pop <- read.csv("https://data.ny.gov/api/views/krt9-ym2k/rows.csv?accessType=DOWNLOAD")
NY_pop_2010 <- NY_pop %>% filter(Program.Type == "Census Base Population") %>% filter(Year == 2010) %>% filter(FIPS.Code != 36000) %>% mutate(region = FIPS.Code) %>% mutate(value = Population)
county_choropleth(NY_pop_2010, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(7,"RdBu"))) + ggtitle("2010 Census Population By County")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
This shows the 2010 Census population by county in the State of New York.
NY_income_2010 <- get_acs(geography = "county",
variables = c(medincome = "B19013_001"),
state = "NY") %>% mutate(region = as.numeric(GEOID)) %>% mutate(value = estimate)
## Getting data from the 2012-2016 5-year ACS
NY_income_2010 <- NY_income_2010 %>% mutate(NAME=gsub(" County, New York", "", NY_income_2010$NAME))
county_choropleth(NY_income_2010, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(7,"RdBu"))) + ggtitle("2010 Median Household Income By County")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
This map shows the median household income by county from the 2010 Cencus data.
fips_codes_ny <- fips_codes %>% filter(state == "new york")
ny_metro <- tibble(county=c("bronx", "kings", "new york", "queens", "richmond", "nassau", "putnam", "rockland", "suffolk", "westchester"), type=c("metro","metro","metro","metro","metro","metro","metro","metro","metro","metro"))
ny_county_type <- full_join(fips_codes_ny, ny_metro, by = "county")
ny_county_type[is.na(ny_county_type)] <- "rural"
ny_county_type<-ny_county_type%>% mutate(region=fips, value=type)
county_choropleth(ny_county_type, state_zoom="new york") + scale_fill_manual(values = rev(brewer.pal(3,"RdBu"))) + ggtitle("Urban vs. Rural New York Metro Counties")
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
This map distinguishes between counties in New York that are a part of the New York City metro area and those that are defined as rural.
NY_census_rural<- read_csv("county_rural.csv")
## Parsed with column specification:
## cols(
## `2015 GEOID` = col_character(),
## State = col_character(),
## `2015 Geography Name` = col_character(),
## Note = col_integer(),
## `2010 Census Total Population` = col_number(),
## `2010 Census Urban Population` = col_number(),
## `2010 Census Rural Population` = col_number(),
## `2010 Census
## Percent Rural` = col_double()
## )
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 2)
## Warning: 1 parsing failure.
## row # A tibble: 1 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 1 <NA> 8 columns 1 columns 'county_rural.csv' file # A tibble: 1 x 5
NY_census_rural<- NY_census_rural %>%
filter(State == "NY") %>%
select(-Note)
NewNYCounty<- CountyNY_wifi %>%
left_join(NY_census_rural, by=c("X2010.Muni.Population"="2010 Census Total Population"))
Total_NY_Wifi<- NewNYCounty %>%
left_join(NY_income_2010, by=c("County"="NAME"))
## Warning: Column `County`/`NAME` joining factor and character vector,
## coercing into character vector
Trimmed_NY_Wifi <- Total_NY_Wifi %>%
select(-GNIS.ID, -Municipality.Name, -Municipality.Type, -county, -GEOID, -`2015 Geography Name`, -state, -`2015 GEOID`, -variable)
colnames(Trimmed_NY_Wifi)[1] <- "Total_Population"
colnames(Trimmed_NY_Wifi)[2] <- "Total_Housing_Units"
colnames(Trimmed_NY_Wifi)[3] <- "Municipal_Area"
colnames(Trimmed_NY_Wifi)[5] <- "REDC_Region"
colnames(Trimmed_NY_Wifi)[6] <- "Cable_Providers"
colnames(Trimmed_NY_Wifi)[7] <- "Units_Cable"
colnames(Trimmed_NY_Wifi)[8] <- "Percent_Cable"
colnames(Trimmed_NY_Wifi)[9] <- "DSL_Providers"
colnames(Trimmed_NY_Wifi)[10] <- "Units_DSL"
colnames(Trimmed_NY_Wifi)[11] <- "Percent_DSL"
colnames(Trimmed_NY_Wifi)[12] <- "Fiber_Providers"
colnames(Trimmed_NY_Wifi)[13] <- "Units_Fiber"
colnames(Trimmed_NY_Wifi)[14] <- "Percent_Fiber"
colnames(Trimmed_NY_Wifi)[15] <- "Wireline_Providers"
colnames(Trimmed_NY_Wifi)[16] <- "Units_Wireline"
colnames(Trimmed_NY_Wifi)[17] <- "Percent_Wireline"
colnames(Trimmed_NY_Wifi)[18] <- "Wireless_Providers"
colnames(Trimmed_NY_Wifi)[19] <- "Units_Wireless"
colnames(Trimmed_NY_Wifi)[20] <- "Percent_Wireless"
colnames(Trimmed_NY_Wifi)[21] <- "Sat_Providers"
colnames(Trimmed_NY_Wifi)[26] <- "Urban_Pop"
colnames(Trimmed_NY_Wifi)[27] <- "Rural_Pop"
colnames(Trimmed_NY_Wifi)[28] <- "Percent_Rural_Pop"
Total_NY_Wifi <- Trimmed_NY_Wifi
cablevcounty <- ggplot(Total_NY_Wifi, aes(x = reorder(County, -Percent_Cable), y=Percent_Cable, fill=Percent_Rural_Pop)) +
geom_col(alpha=1, position = position_stack()) + theme(axis.text.x = element_text(angle = 65, hjust = 1)) + labs(title= "% of Households With Cable vs. County Name", x="County Name (NY)", y="% of Households With Cable")
ggplotly(cablevcounty)
This visualization displays the percentage of households with cable per county in New York state. Metro counties generally have a higher percentage of households with cable (as illustrated by the left end of the x-axis) when compared to rural counties (shown on the right side of the x-axis)
DSLvcounty <- ggplot(Total_NY_Wifi, aes(x = reorder(County, -Percent_DSL), y=Percent_DSL, fill=Percent_Rural_Pop)) +
geom_col(alpha=.8, position = position_stack()) + theme(axis.text.x = element_text(angle = 65, hjust = 1)) + labs(title= "% of Households With DSL vs. County Name", x="County Name (NY)", y="% of Households With DSL")
ggplotly(DSLvcounty)
This visualization illustrates the percentage of households per county that have DSL in New York state. This chart shows the same trend as the previous, the metro county households have a tendency to have more access to DSL than households in rural counties.
Fibervcounty<- ggplot(Total_NY_Wifi, aes(x = reorder(County, -Percent_Fiber), y=Percent_Fiber, fill=Percent_Rural_Pop)) +
geom_col(alpha=1, position = position_stack()) + theme(axis.text.x = element_text(angle = 65, hjust = 1)) + labs(title= "% of Households With Fiber vs. County Name", x="County Name (NY)", y="% of Households With Fiber")
ggplotly(Fibervcounty)
This visualization depicts the percentage of households in New York state counties with access to fiber, a super speed internet service offered by providers in select locations. Fiber is limited almost exclusively to metro counties in New York state.
Wirelinevcounty <- ggplot(Total_NY_Wifi, aes(x = reorder(County, -Percent_Wireline), y=Percent_Wireline, fill=Percent_Rural_Pop)) +
geom_col(alpha=1, position = position_stack()) + theme(axis.text.x = element_text(angle = 65, hjust = 1)) + labs(title= "% of Households With Wireline vs. County Name", x="County Name (NY)", y="% of Households With Wireline")
ggplotly(Wirelinevcounty)
This visualization shows the percentage of households per county in New York state that have Wireline access (DSL, Cable, or Fiber). The results across the board, from metro to rural, are fairly similar; but the percentage in rural counties is slightly lower.
Wirelessvcounty <- ggplot(Total_NY_Wifi, aes(x = reorder(County, -Percent_Wireless), y=Percent_Wireless, fill=Percent_Rural_Pop)) +
geom_col(alpha=.8, position = position_stack()) + theme(axis.text.x = element_text(angle = 65, hjust = 1)) + labs(title= "% of Households With Wireless vs. County Name", x="County Name (NY)", y="% of Households With Wireless")
ggplotly(Wirelessvcounty)
This visualization shows the percentage of households per county in New York state that have access to Wireless services provided through the use of satellites. The results in both metro and rural counties are fairly homogeneous.
-Kyaw Za looked up data sources, did the API calls, and did the data wrangling.
-Kat Lewis did the map visualizations and some descriptions; contributed to data wrangling to make maps.
-Myles Klapowski made the bar graph visualizations and looked up data sources.
-Danny Wellman wrote the descriptions and did the project formatting.